Policy Improvement: Between Black-Box Optimization and Episodic Reinforcement Learning
Authors
Abstract
Policy improvement methods seek to optimize the parameters of a policy with respect to a utility function. There are two main approaches to performing this optimization: reinforcement learning (RL) and black-box optimization (BBO). In recent years, benchmark comparisons between RL and BBO have been made, and there have been several attempts to specify which approach works best for which types of problem classes. In this article, we make several contributions to this line of research by: 1) Classifying several RL algorithms in terms of their algorithmic properties. 2) Showing how the derivation of ever more powerful RL algorithms displays a trend towards BBO. 3) Continuing this trend by applying two modifications to the state-of-the-art PI² algorithm, which yields an algorithm we denote PI^BB. We show that PI^BB is a BBO algorithm. 4) Demonstrating that PI^BB achieves similar or better performance than PI² on several evaluation tasks. 5) Analyzing why BBO outperforms RL on these tasks. We do not make the case for BBO or RL in general, since we expect their relative performance to depend on the task considered. Instead, we provide two algorithms with which such cases can be made, as they are identical in all respects except in being RL or BBO approaches to policy improvement.
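To make the BBO view of policy improvement concrete, the following is a minimal sketch of a generic reward-weighted-averaging update, in which the optimizer sees only one scalar cost per sampled parameter vector. It is not the algorithm from the article: the quadratic `cost` function, the names `bbo_update` and `optimize`, and all parameter values (`sigma`, `n_samples`, `h`) are assumptions chosen for illustration.

```python
import math
import random


def cost(theta):
    """Hypothetical utility: squared distance of the parameters to a fixed target."""
    target = [1.0, -2.0]
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def bbo_update(theta, sigma, n_samples, h, rng):
    """One update that treats the cost as a black box (one scalar per sample)."""
    # Explore by perturbing the current parameters with Gaussian noise.
    samples = [[t + rng.gauss(0.0, sigma) for t in theta] for _ in range(n_samples)]
    costs = [cost(s) for s in samples]

    # Map costs to weights: low cost gets a high weight (assumed exponential form).
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0  # avoid division by zero if all costs are equal
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)

    # The new parameters are the weighted average of the sampled parameters.
    return [
        sum(w * s[i] for w, s in zip(weights, samples)) / total
        for i in range(len(theta))
    ]


def optimize(theta0, n_updates=50, sigma=0.5, n_samples=20, h=10.0, seed=0):
    """Repeat the update with a constant exploration magnitude."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(n_updates):
        theta = bbo_update(theta, sigma, n_samples, h, rng)
    return theta
```

Calling `optimize([0.0, 0.0])` returns a parameter vector whose cost can be compared against `cost([0.0, 0.0])`. An RL variant would typically use the rewards received at individual time steps within a rollout, whereas this sketch uses only the aggregate cost of each sample.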
Similar resources
Policy Improvement Methods: Between Black-Box Optimization and Episodic Reinforcement Learning
Policy improvement methods seek to optimize the parameters of a policy with respect to a utility function. There are two main approaches to performing this optimization: reinforcement learning (RL) and black-box optimization (BBO). Whereas BBO algorithms are generic optimization methods that, due to their generality, may also be applied to optimizing policy parameters, RL algorithms are specifi...
Combined Optimization and Reinforcement Learning for Manipulation Skills
This work addresses the problem of how a robot can improve a manipulation skill in a sample-efficient and secure manner. As an alternative to the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) A known analytic control cost function; 2) A black-box return functio...
Generalized Reinforcement Learning for Manipulation Skills – Combining Low-dimensional Bayesian Optimization with High-dimensional Motion Optimization
This paper addresses the problem of how a robot can autonomously improve a manipulation skill in an efficient and secure manner. Instead of using the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) A known analytic cost function; 2) A black-box reward function; 3)...
Mirror Descent Search and Acceleration
In recent years, attention has been focused on the relationship between black box optimization and reinforcement learning. Black box optimization is a framework for the problem of finding the input that optimizes the output represented by an unknown function. Reinforcement learning, by contrast, is a framework for finding a policy to optimize the expected cumulative reward from trial and error....
Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control
In this work we introduce the application of black-box quantum control as an interesting reinforcement learning problem to the machine learning community. We analyze the structure of the reinforcement learning problems arising in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks trained via stochastic policy gradients yield a general method to solving...